156 research outputs found

    Network Recasting: A Universal Method for Network Architecture Transformation

    This paper proposes network recasting as a general method for network architecture transformation. The primary goal of the method is to accelerate inference through the transformation, but it has many other practical applications. The method is based on block-wise recasting: it recasts each source block in a pre-trained teacher network to a target block in a student network. For the recasting, a target block is trained such that its output activation approximates that of the source block. Such sequential block-by-block recasting transforms the network architecture while preserving accuracy. The method can transform an arbitrary teacher network type into an arbitrary student network type. It can even generate a mixed-architecture network that consists of two or more types of blocks. Network recasting can generate a network with fewer parameters and/or activations, which reduces the inference time significantly. Naturally, it can be used for network compression by recasting a trained network into a smaller network of the same type. Our experiments show that it outperforms previous compression approaches in terms of actual speedup on a GPU. Comment: AAAI 2019 Oral presentation, source codes are available on github: https://github.com/joonsang-yu/Network-Recastin
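    The block-wise step described in this abstract, training a target block so that its output approximates the source block's output, can be illustrated with a toy sketch. This is not the authors' implementation: the block shapes, the names `teacher_block` and `recast_block`, and the choice of a single dense layer as the student are all assumptions made for illustration.

```python
import numpy as np

def teacher_block(x, w1, w2):
    """Hypothetical source block: two dense layers with a ReLU in between."""
    return np.maximum(x @ w1, 0.0) @ w2

def recast_block(x, target, steps=200):
    """Fit a single dense layer (the assumed student block) to the teacher's output.

    Minimizes the mean squared error (1/n) * ||x @ w - target||^2 by gradient descent.
    """
    n, d_in = x.shape
    w = np.zeros((d_in, target.shape[1]))
    # Step size 1/L, where L bounds the curvature of the quadratic loss,
    # so every update is guaranteed not to increase the loss.
    lipschitz = 2.0 / n * np.linalg.eigvalsh(x.T @ x).max()
    lr = 1.0 / lipschitz
    losses = []
    for _ in range(steps):
        err = x @ w - target
        losses.append(float((err ** 2).sum() / n))
        w -= lr * (2.0 / n) * (x.T @ err)
    return w, losses

# Synthetic inputs and randomly initialized teacher weights (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 4))

target = teacher_block(x, w1, w2)          # output activation of the source block
w_student, losses = recast_block(x, target)
```

    In this sketch the student replaces two weight matrices (8x16 and 16x4) with one 8x4 matrix, so it has fewer parameters. A linear student cannot reproduce the ReLU exactly, so the fit is only approximate.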

    Retrospective: A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing

    Our ISCA 2015 paper provides a new programmable processing-in-memory (PIM) architecture and system design that can accelerate key data-intensive applications, with a focus on graph processing workloads. Our major idea was to completely rethink the system, including the programming model, data partitioning mechanisms, system support, and instruction set architecture, along with near-memory execution units and their communication architecture, such that an important workload can be accelerated at a maximum level using a distributed system of well-connected near-memory accelerators. We built our accelerator system, Tesseract, using 3D-stacked memories with logic layers, where each logic layer contains general-purpose processing cores and cores communicate with each other using a message-passing programming model. Cores could be specialized for graph processing (or any other application to be accelerated). To our knowledge, our paper was the first to completely design a near-memory accelerator system from scratch such that it is both generally programmable and specifically customizable to accelerate important applications, with a case study on major graph processing workloads. Ensuing work in academia and industry showed that similar approaches to system design can greatly benefit both graph processing workloads and other applications, such as machine learning, for which ideas from Tesseract seem to have been influential. This short retrospective provides a brief analysis of our ISCA 2015 paper and its impact. We briefly describe the major ideas and contributions of the work, discuss later works that built on it or were influenced by it, and make some educated guesses on what the future may bring for PIM and accelerator systems. Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International Symposium on Computer Architecture), Commemorative Issue, 202

    Worst Case Execution Time Analysis for Synthesized Hardware

    We propose a hardware performance estimation flow for fast design space exploration, based on worst-case execution time analysis algorithms originally developed for software. Test cases on some real-world applications show that our flow provides a tight upper bound on the execution time, along with many useful hints to the designer.

    Partial Bus-Invert Coding for Power Optimization of Application-Specific Systems

    This paper presents two bus coding schemes for power optimization of application-specific systems: Partial Bus-Invert coding and its extension, Multiway Partial Bus-Invert coding. In the first scheme, only a selected subgroup of bus lines is encoded, which avoids unnecessary inversion of the relatively inactive and/or uncorrelated bus lines outside the subgroup. In the extended scheme, we partition a bus into multiple subbuses by clustering highly correlated bus lines and then encode each subbus independently. We describe a heuristic algorithm for partitioning a bus into subbuses for each encoding scheme. Experimental results for various examples indicate that both encoding schemes are highly efficient for application-specific systems.
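    The first scheme can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the subgroup is given as a hypothetical bit mask `mask`, the inversion rule is the basic bus-invert majority test applied to the masked lines only (it ignores transitions on the invert line itself), and the subgroup selection heuristic from the paper is not reproduced.

```python
def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")

def partial_bus_invert_encode(words, mask):
    """Encode a word sequence; only the bus lines selected by `mask` may be inverted.

    Returns a list of (bus_value, invert_flag) pairs. Lines outside the
    mask are always driven with the original data bits.
    """
    subgroup_size = popcount(mask)
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        # Count how many subgroup lines would toggle if the word were sent as-is.
        toggles = popcount((word ^ prev) & mask)
        if 2 * toggles > subgroup_size:
            bus_value, invert = word ^ mask, 1  # invert the subgroup lines only
        else:
            bus_value, invert = word, 0
        encoded.append((bus_value, invert))
        prev = bus_value
    return encoded

def partial_bus_invert_decode(encoded, mask):
    """Recover the original words by undoing the subgroup inversion."""
    return [value ^ mask if invert else value for value, invert in encoded]

# Example: an 8-bit bus where only the low four lines form the encoded subgroup.
MASK = 0x0F
data = [0x0F, 0x00, 0xA5, 0x5A, 0xFF]
coded = partial_bus_invert_encode(data, MASK)
decoded = partial_bus_invert_decode(coded, MASK)
```

    The Multiway extension would apply the same rule independently to each subbus, with one invert line per subbus.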

    Continuum-based design sensitivity analysis and optimization of nonlinear shell structures using meshfree method

    A continuum-based shape and configuration design sensitivity analysis (DSA) method for a finite deformation elastoplastic shell structure has been developed. Shell elastoplasticity is treated using the projection method that performs the return mapping on the subspace defined by the zero-normal stress condition. An incrementally objective integration scheme is used in the context of finite deformation shell analysis, wherein the stress objectivity is preserved for finite rotation increments. The material derivative concept is used to develop a continuum-based shape and configuration DSA method. Significant computational efficiency is obtained by solving the design sensitivity equation without iteration at each converged load step using the same consistent tangent stiffness matrix. Numerical implementation of the proposed shape and configuration DSA is carried out using the meshfree method. The accuracy and efficiency of the proposed method are illustrated using numerical examples.

    DESIGN SENSITIVITY ANALYSIS OF NONLINEAR SHELL STRUCTURE WITH FRICTIONLESS CONTACT

    A continuum-based shape and configuration design sensitivity analysis method for a finite deformation elastoplastic shell structure with frictionless contact has been developed. Shell elastoplasticity is treated based on the projection method that performs the return mapping on the subspace defined by the zero-normal stress condition. An incrementally objective integration scheme is used in the context of finite deformation shell analysis, wherein stress objectivity is preserved for finite rotation increments. The penalty regularization method is used to approximate the contact variational inequality. The material derivative concept is used to develop continuum-based design sensitivity. The design sensitivity equation is solved without iteration at each converged load step. Numerical implementation of the proposed shape and configuration design sensitivity analysis is carried out using the meshfree method. The accuracy and efficiency of the proposed method are illustrated using numerical examples.
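    The idea behind penalty regularization of contact can be shown on a one-degree-of-freedom toy problem, which is far simpler than the shell formulation in this abstract. The sketch assumes a linear spring pushed toward a rigid wall. The function name and all parameters are hypothetical. Penetration of the wall is resisted by a force proportional to the penetration depth, so the impenetrability constraint is satisfied only approximately.

```python
def penalty_contact_displacement(stiffness, force, gap, penalty):
    """Displacement of a spring loaded toward a rigid wall at distance `gap`.

    Without contact, equilibrium is stiffness * u = force. If that solution
    would pass the wall, a penalty force penalty * (u - gap) is added:
        stiffness * u + penalty * (u - gap) = force
    """
    u_free = force / stiffness
    if u_free <= gap:
        return u_free  # the wall is not reached, so no contact force acts
    return (force + penalty * gap) / (stiffness + penalty)

# Penetration beyond the wall shrinks as the penalty parameter grows.
u_soft = penalty_contact_displacement(stiffness=1.0, force=2.0, gap=1.0, penalty=10.0)
u_stiff = penalty_contact_displacement(stiffness=1.0, force=2.0, gap=1.0, penalty=1000.0)
penetration_soft = u_soft - 1.0
penetration_stiff = u_stiff - 1.0
```

    A larger penalty parameter enforces the constraint more closely, but in a full finite deformation analysis it also worsens the conditioning of the tangent stiffness matrix, so the value is a trade-off.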